Overview

In this practical you’ll practice plotting data with the ggplot2 package.

Packages

Package Installation
tidyverse install.packages("tidyverse")
ggthemes install.packages("ggthemes")
skimr install.packages("skimr")

Examples

  • The following examples will take you through the steps of creating both simple and complex plots with ggplot2. Try to go through each line of code and see how it works!
# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------

library(tidyverse)         # Load tidyverse (which contains ggplot2!)

mpg # Look at the mpg data

# A blank canvas: no aesthetic mappings yet, so nothing is drawn
ggplot(data = mpg)

# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are mapped to the x and y aesthetics
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy))   # Map displ to x-axis and hwy to y-axis

#  Add points with geom_point()
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_point()     

# geom_count() also draws points, but sizes them by the number of overlapping observations
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_count()   

# Again, but with some additional arguments
# Also applying theme_minimal() to this plot only

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
  geom_point(col = "red",                   # Red points
             size = 3,                      # Larger size
             alpha = .5,                    # Semi-transparent points
             position = "jitter") +         # Jitter the points
  scale_x_continuous(limits = c(1, 15)) +   # Axis limits
  scale_y_continuous(limits = c(0, 50)) +
  theme_minimal()


# Assign class to the color aesthetic and add labels with labs()

ggplot(data = mpg, 
  mapping = aes(x = displ, y = hwy, col = class)) +  # Change color based on class column
  geom_point(size = 3, position = 'jitter') +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2")
  

# Add a regression line for each class

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(method = "lm")

# Add a single regression line fitted to all classes together (a fixed col overrides the class grouping)

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(col = "blue", method = "lm")


# Facet by class
ggplot(data = mpg,
       mapping = aes(x = displ, 
                     y = hwy, 
                     color = factor(cyl))) + 
  geom_point() +
  facet_wrap(~ class) 


# Another fancier example

ggplot(data = mpg, 
       mapping = aes(x = cty, y = hwy)) + 
  geom_count(aes(color = manufacturer)) +       # Point size shows the number of overlapping cars (see ?geom_count)
  geom_smooth(se = FALSE) +                     # Smoothed line without confidence interval
  geom_text(data = filter(mpg, cty > 25),       # Label only the cars with city mpg > 25
            mapping = aes(label = model),       # Use the car model as the label
            position = position_nudge(y = -1),  # Shift labels slightly below their points
            check_overlap = TRUE,               # Skip labels that would overlap
            size = 5) + 
  labs(x = "City miles per gallon", 
       y = "Highway miles per gallon",
       title = "City and Highway miles per gallon", 
       subtitle = "Labels show the models of cars with city mpg > 25",
       caption = "Source: mpg data in ggplot2",
       color = "Manufacturer", 
       size = "Counts")

Tasks

Datasets

File Rows Columns
mcdonalds.csv 260 24

A - Setup

A1. Open your R project. It should already have the folders 1_Data and 2_Code. Make sure that the data files listed in the Datasets section above are in your 1_Data folder.

A2. Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called plotting_practical.R in the 2_Code folder.

A3. Using library(), load the packages listed in the Packages section above.

## NAME
## DATE
## Plotting Practical

library(XX)     
library(XX)
#...

A4. For this practical, we’ll use the mcdonalds data, which contains nutrition information about items from McDonald’s. Using the following template, load the data into R and store it as a new object called mcdonalds.

# Load mcdonalds.csv from the data folder in your working directory

mcdonalds <- read_csv(file = "XXX/XXX")

A5. Take a look at the first few rows of the dataset(s) by printing them to the console.

A6. Use the skim() function (from the skimr package) to get more details on the dataset(s).

B - Building a plot step-by-step

In this section, you’ll build the following plot step by step.

B1. Using ggplot(), create the following blank plot with the data and mapping arguments (but no geom). Use calories for the x aesthetic and saturated_fat for the y aesthetic.

ggplot(data = mcdonalds, 
       mapping = aes(x = XX, y = XX))

B2. Using geom_point(), add points to the plot.

ggplot(data = mcdonalds, 
       mapping = aes(x = XX, y = XX)) +
  geom_point()

B3. Using the color aesthetic mapping, color the points by their Category.

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() 

B4. Add a smoothed average line using geom_smooth().

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() +
  geom_smooth() 

B5. Oops! Did you get several smoothed lines instead of just one? That happens because mapping col to Category also splits the data into one group per category, and geom_smooth() fits a separate line to each group. Fix it by setting the line to a single fixed color, “black”, inside geom_smooth(). You should then see only one line.

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() +
  geom_smooth(col = "XX") 

B6. Add appropriate labels using the labs() function.

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() +
  geom_smooth(col = "XX") +
  labs(title = "XX",
       subtitle = "XX",
       caption = "XX")

B7. Set the limits of the x-axis to 0 and 1250 using xlim().

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() +
  geom_smooth(col = "XX") +
  labs(title = "XX",
       subtitle = "XX",
       caption = "XX") +
  xlim(XX, XX)

B8. Finally, set the plotting theme to theme_minimal(). You should now have the final plot!

ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
  geom_point() +
  geom_smooth(col = "XX") +
  labs(title = "XX",
       subtitle = "XX",
       caption = "XX") +
  xlim(XX, XX) +
  theme_minimal()
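The section B steps can be sketched end to end on the built-in mpg data. This is a minimal sketch that assumes mpg as a stand-in for mcdonalds (so the column names differ), and the titles and the 2 to 5 litre range are illustrative choices. It also contrasts xlim(), which removes observations outside the limits before geom_smooth() is fitted (and warns about the removed rows), with coord_cartesian(), which keeps every observation and only zooms the view.

```r
library(tidyverse)

# Step-by-step build on mpg (a stand-in for the mcdonalds data)
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, col = class)) +  # col = class also splits the data into groups
  geom_point() +
  geom_smooth(col = "black") +  # A fixed colour replaces the mapped one, so a single line is fitted
  labs(title = "Engine size and highway mileage",
       subtitle = "Larger engines tend to get fewer miles per gallon",
       caption = "Source: mpg data in ggplot2")

# Option 1: xlim() drops points outside 2 to 5 litres before the smoother is fitted
p_dropped <- p + xlim(2, 5) + theme_minimal()

# Option 2: coord_cartesian() keeps all points and only zooms the visible area
p_zoomed <- p + coord_cartesian(xlim = c(2, 5)) + theme_minimal()

p_dropped
p_zoomed
```

Use xlim() when out-of-range observations should not influence the fitted line, and coord_cartesian() when you only want to zoom in.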

C - Playing with geoms

C1. Create the following plot showing the relationship between menu category and calories.

ggplot(data = mcdonalds, aes(x = XX, y = XX, fill = XX)) +
  geom_violin() +
  guides(fill = "none") +
  labs(title = "XX",
       subtitle = "XX")

C2. Add the layer + stat_summary(fun = "mean", geom = "point", col = "white", size = 4) to include points showing the mean of each distribution. (Older code uses fun.y, which has been deprecated since ggplot2 3.3.0.)

C3. Now add + geom_jitter(width = .1, alpha = .5) to your plot. What do you see?

C4. Play around with your plotting arguments to see how the results change! Each time you make a change, run the plot again to see your new output!

- Change the summary function in `stat_summary()` from "mean" to "median"
- Change the size of the points in `stat_summary()` to something much bigger (or smaller).
- Change the `width` argument in `geom_jitter()` to `width = 0`
- Instead of using `geom_violin()`, try `geom_boxplot()`
- Remove the `fill = Category` aesthetic entirely.
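The section C layers can be tried on the built-in mpg data before you apply them to mcdonalds. This is a sketch assuming class and hwy as stand-ins for the menu category and calories; it shows a violin plot with a median summary point and jittered raw data, and then the same data as boxplots with width = 0 jitter.

```r
library(tidyverse)

# Violin plot of highway mpg for each vehicle class (mpg stands in for mcdonalds)
p_violin <- ggplot(data = mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin() +
  stat_summary(fun = median, geom = "point", col = "white", size = 4) +  # Median of each class
  geom_jitter(width = .1, alpha = .5) +  # Raw observations, spread sideways to reduce overlap
  guides(fill = "none") +                # The x-axis already names each class
  labs(title = "Highway mileage by vehicle class",
       subtitle = "White points show the median of each class")

# The same data summarised with boxplots instead of violins
p_box <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  geom_jitter(width = 0, alpha = .5)     # width = 0: no sideways spread, points stay in a column

p_violin
p_box
```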

D - Using facets

D1. Create the following plot showing the relationship between Sodium and calories.

ggplot(XX, aes(x = XX, y = XX)) +
  geom_point(alpha = .2) +
  facet_wrap(~ XX) +
  labs(title = "XX",
       subtitle = "XX") +
  theme_minimal()

D2. Try the following ways to customise your plot:

  • Color the points by their category
  • Add a smoothed line to each plot with geom_smooth()
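Both D2 customisations can be sketched on the mpg data, assuming drv (drive type) as a stand-in for the menu category. Because each panel contains a single colour group, geom_smooth() fits one line per panel; method = "lm" and se = FALSE are illustrative choices, not requirements of the task.

```r
library(tidyverse)

# One panel per drive type, coloured by the same variable,
# with a separate straight-line fit drawn inside every panel
p_facets <- ggplot(mpg, aes(x = displ, y = hwy, col = drv)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ drv) +
  labs(title = "Engine displacement and highway mileage",
       subtitle = "One panel per drive type") +
  theme_minimal()

p_facets
```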

E - Playing with themes

E1. The ggthemes package has many additional plotting themes. Look at the package’s help index (help(package = "ggthemes")) to see all of the themes.

E2. Adjust some of your previous plots using the theme_excel() theme to see a really ugly Excel-like plot!
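Storing a plot in an object makes it easy to swap themes. This sketch assumes the ggthemes package from the Packages section is installed; theme_economist() is simply another ggthemes theme used for comparison.

```r
library(tidyverse)
library(ggthemes)   # Extra themes such as theme_excel() and theme_economist()

# A basic scatterplot stored as an object so only the theme changes below
base_plot <- ggplot(mpg, aes(x = displ, y = hwy, col = class)) +
  geom_point()

base_plot + theme_minimal()    # Built-in ggplot2 theme
base_plot + theme_excel()      # Excel-style theme from ggthemes
base_plot + theme_economist()  # Theme styled after The Economist, also from ggthemes
```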

Saving plots as objects

  1. Create the following plot from the mpg dataset, and save it as an object called myplot.

  2. Now, using object assignment (<-), add a regression line to the myplot object with geom_smooth(). Then evaluate the object to see the updated version. It should now look like this:

  3. Using ggsave(), save the object as a pdf file called myplot.pdf in your 3_Figures folder. Set the width to 6 inches and the height to 4 inches. Open the pdf outside of RStudio to make sure it worked!
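The three steps above can be sketched as follows. The plot itself is an assumption (displ against hwy), since the target image is not available here, and dir.create() is added only so the example also runs in a project where 3_Figures does not exist yet.

```r
library(tidyverse)

# Store the plot in an object instead of printing it straight away
myplot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Add a layer by re-assigning the object to itself plus the new layer
myplot <- myplot + geom_smooth()

myplot   # Evaluate the object to draw the updated plot

# ggsave() needs an existing folder, so create it if it is missing
dir.create("3_Figures", showWarnings = FALSE)
ggsave(filename = "3_Figures/myplot.pdf", plot = myplot,
       width = 6, height = 4, units = "in")
```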

Demographic information of midwest counties in the US

  1. Print the midwest dataset (it’s contained in ggplot2) and look at its help page (?midwest) to see what variables it contains. It should look like this:
# A tibble: 437 x 28
     PID county    state  area poptotal popdensity popwhite popblack
   <int> <chr>     <chr> <dbl>    <int>      <dbl>    <int>    <int>
 1   561 ADAMS     IL    0.052    66090      1271.    63917     1702
 2   562 ALEXANDER IL    0.014    10626       759      7054     3496
 3   563 BOND      IL    0.022    14991       681.    14477      429
 4   564 BOONE     IL    0.017    30806      1812.    29344      127
 5   565 BROWN     IL    0.018     5836       324.     5264      547
 6   566 BUREAU    IL    0.05     35688       714.    35157       50
 7   567 CALHOUN   IL    0.017     5322       313.     5298        1
 8   568 CARROLL   IL    0.027    16805       622.    16519      111
 9   569 CASS      IL    0.024    13437       560.    13384       16
10   570 CHAMPAIGN IL    0.058   173025      2983.   146506    16559
# ... with 427 more rows, and 20 more variables: popamerindian <int>,
#   popasian <int>, popother <int>, percwhite <dbl>, percblack <dbl>,
#   percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>
  2. Using the following code as a template, create the following plot showing the relationship between college education and poverty.
ggplot(data = XX, 
    mapping = aes(x = XX, y = XX)) + 
    geom_point(aes(fill = XX, size = XX), shape = 21, color = "white") +  # shape 21: fill = interior, color = border
    geom_smooth(aes(x = XX, y = XX)) +
    labs(
        x = "XX", 
        y = "XX", 
        title = "XX",
        subtitle = "XX",
        caption = "XX") + 
    scale_fill_brewer(palette = "XX") +  # fill is the mapped aesthetic, so use a fill scale
    scale_size(range = c(XX, XX)) +
    guides(size = guide_legend(override.aes = list(col = "black")), 
           fill = guide_legend(override.aes = list(size = 5)))

  3. Using the following template, create the following density plot showing how the share of inhabitants with a college education is distributed in each state.
ggplot(data = XX, 
       mapping = aes(XX, fill = XX)) + 
  geom_density(alpha = XX) + 
  labs(title = "XX", 
       subtitle = "XX",
       caption = "XX",
       x = "XX",
       y = "XX",
       fill = "XX")
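The density template can be tried on the mpg data first. This sketch assumes hwy and drv as stand-ins for the college-education variable and the state; an alpha below 1 keeps the overlapping densities visible.

```r
library(tidyverse)

# Density of highway mpg for each drive type (mpg stands in for midwest)
p_density <- ggplot(data = mpg,
                    mapping = aes(hwy, fill = drv)) +
  geom_density(alpha = .4) +   # Semi-transparent fills so overlaps stay visible
  labs(title = "Highway mileage by drive type",
       x = "Highway miles per gallon",
       y = "Density",
       fill = "Drive type")

p_density
```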

Heatplots with geom_tile()

  1. You can create heatplots using the geom_tile() function. Using the template below, try creating the following heatplot of NBA player statistics:
# Read in nba data
nba_long <- read_csv("https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_data/nba_long.csv")

# Look at the data
nba_long

ggplot(XX, 
       mapping = aes(x = XX, y = XX, fill = XX)) + 
  geom_tile(colour = "XX") + 
  scale_fill_gradientn(colors = c("XX", "XX", "XX"))+ 
  labs(x = "XX", 
       y = "XX", 
       fill = "XX", 
       title = "NBA XX performance",
       subtitle = "XX",
       caption = "XX") +
  coord_flip()
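geom_tile() expects one row per tile, with one variable for each axis and one for the fill colour. As a sketch that avoids downloading the NBA file, the example below summarises mpg into a class by drive-type table of mean highway mpg; the colour choices are illustrative.

```r
library(tidyverse)

# One row per class x drive-type combination, with the mean highway mpg
mpg_means <- mpg %>%
  group_by(class, drv) %>%
  summarise(mean_hwy = mean(hwy), .groups = "drop")

p_tile <- ggplot(mpg_means,
                 mapping = aes(x = class, y = drv, fill = mean_hwy)) +
  geom_tile(colour = "white") +                      # White borders separate the tiles
  scale_fill_gradientn(colors = c("red", "yellow", "green")) +
  labs(x = "Vehicle class", y = "Drive type", fill = "Mean hwy",
       title = "Average highway mileage") +
  coord_flip()                                       # Put the classes on the vertical axis

p_tile
```

Combinations with no cars (for example, no rear-wheel-drive compacts) simply have no tile.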

  2. Make the following plot of savings data (psavert) from the economics dataset.

  3. Make the following plot from the trial_act.csv dataset. To do this, you’ll need to use both geom_boxplot() and geom_point(). To jitter the points, set the position argument of geom_point() to position_jitter(), whose width and height arguments control how much the points are jittered.
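The boxplot-plus-jittered-points pattern can be sketched on mpg, assuming drv and hwy as stand-ins for the trial_act.csv variables. Setting height = 0 jitters the points sideways only, so their y values stay exact, and seed makes the jitter reproducible.

```r
library(tidyverse)

# Boxplots of highway mpg per drive type with the raw observations on top
p_jitter <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +   # Hide boxplot outliers; every point is drawn below anyway
  geom_point(position = position_jitter(width = .2, height = 0, seed = 1),
             alpha = .4)               # Sideways jitter only

p_jitter
```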

References and Further Reading